-
Notifications
You must be signed in to change notification settings - Fork 125
🎨 Markdown formatting improvements #2245
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
🦋 Changeset detectedLatest commit: dd786b9 The changes in this PR will be included in the next version bump. This PR includes changesets to release 6 packages
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
rowanc1
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Highlighting changes below. We need to get the parsing data onto the mystDirective/Role nodes, so that we have an easier time in formatting. Would love some thoughts on direction here.
| directives: [cardDirective], | ||
| }); | ||
| expect(output).toEqual(expected); | ||
| expect(output).toMatchObject(expected); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tests in the extension code are made a bit less strict here. No changes to the code, so no need to bump the packages.
| if (argSpec.required && data.arg == null) { | ||
| validationError = true; | ||
| } | ||
| node.args = data.arg; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the biggest change is that we put the arg, options and body of roles and directives onto the mystDirective node. This allows us to process them afterward in a formatting package - or transform a figure into a directive etc. rather than doing that in the markdown writer.
This shouldn't matter for the duplication as we remove these nodes early in the processing for most things. But it is really helpful to have them for the formatting / writing of markdown.
We might want to change it to node.argsParsed or something so we have the pre/post parsing? Or put it on the data field.
@fwkoch or @agoose77 if you have opinions here, that would help.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok - seeing this, my first impression was "we already stash this stuff on the mystDirective nodes, don't we...?" - That's not entirely true.
On the initial parse pass, we do save args and options to the mystDirective node; these are the raw, unparsed values. I don't think these fields are used anywhere. Additionally, we create mystDirectiveArg/Option/Body nodes, where the raw value is saved to be parsed, in this applyDirectives function. Within this function we are building a data object of type DirectiveData which contains all the parsed arg/option/body content. Critically, this data object is transient; it's just passed directly to the directive plugins.
So, the change in this PR is taking the parsed, previously transient arg/option/body values and saving them directly to the mystDirective node. I agree this is a good change.
Getting back to your specific question @rowanc1 - are we putting this in the right place? The biggest issue is that we are clobbering raw arg/option values with parsed arg/option values. I think this is fine; we really should only ever be dealing with the parsed values. The only argument I can think of for keeping the raw values is for round-tripping documents exactly as written. But that's not useful at all - MyST does not care about this (I don't think!), we just don't want to lose any data, and for that, parsed structured data is better. (This is especially true if we are thinking about converting between formats - in that case, any raw value is totally useless.)
I guess the other argument against clobbering the raw values with parsed values is simply backwards compatibility. This is a backwards incompatible change that I don't think will impact anyone, but there could be something somewhere using these raw mystDirective fields. (Maybe we make this a minor version bump?)
Regarding the alternatives you suggest:
- I don't think we should do
argsParsed... that feels excessive; we are keeping track ifnode.processed === false, so if that's not the case (i.e. the node is processed),argswill be parsed. It's also a little silly withbodyParsedsince we are not saving the rawbody... - Stashing
args/options/bodyonnode.datawould be fine - this is just taking theDirectiveDataobject and saving it as "directive data." I somewhat like that, but at the same time, thedatafield is usually used formetadata- in the case ofmystDirectivenodes, args/options/body are the fundamental content, not metadata. 🤷
So maybe it's just fine as-implemented? (We could potentially even stop creating the data object, and pass the node as DirectiveData to the run function... But I think for now this is overthinking it...)
TL;DR: 👍 to your change, maybe minor version bump.
1599ced to
dd786b9
Compare
fwkoch
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok - I responded to your question @rowanc1 - but after getting to understand this PR a little more, I think we need slight clarity on the direction:
The tests for these improvements, especially around directive/role handling, are focused on a narrow scoped pipeline that only includes myst-parser and myst-to-md. It ignores all the transforms and additional processing included in the full myst-cli pipeline.
For me to feel comfortable with these changes we need to do one of three things:
- Clarify the scope of this PR is isolated to standalone usage of
myst-parserandmyst-to-md; we should not expect to see the same improvements when using the fullmyst-clipipeline. - Modify the
myst-clipipeline for md export to perform fewer transforms (including no lifting ofmystDirectivenodes, no propagatingmystTargets, etc). - Refactor the changes here so they provide the same improvements to post-processed
myst-climdast as they do to unprocessedmyst-parsermdast.
Happy to discuss all this with you more!
| if (argSpec.required && data.arg == null) { | ||
| validationError = true; | ||
| } | ||
| node.args = data.arg; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok - seeing this, my first impression was "we already stash this stuff on the mystDirective nodes, don't we...?" - That's not entirely true.
On the initial parse pass, we do save args and options to the mystDirective node; these are the raw, unparsed values. I don't think these fields are used anywhere. Additionally, we create mystDirectiveArg/Option/Body nodes, where the raw value is saved to be parsed, in this applyDirectives function. Within this function we are building a data object of type DirectiveData which contains all the parsed arg/option/body content. Critically, this data object is transient; it's just passed directly to the directive plugins.
So, the change in this PR is taking the parsed, previously transient arg/option/body values and saving them directly to the mystDirective node. I agree this is a good change.
Getting back to your specific question @rowanc1 - are we putting this in the right place? The biggest issue is that we are clobbering raw arg/option values with parsed arg/option values. I think this is fine; we really should only ever be dealing with the parsed values. The only argument I can think of for keeping the raw values is for round-tripping documents exactly as written. But that's not useful at all - MyST does not care about this (I don't think!), we just don't want to lose any data, and for that, parsed structured data is better. (This is especially true if we are thinking about converting between formats - in that case, any raw value is totally useless.)
I guess the other argument against clobbering the raw values with parsed values is simply backwards compatibility. This is a backwards incompatible change that I don't think will impact anyone, but there could be something somewhere using these raw mystDirective fields. (Maybe we make this a minor version bump?)
Regarding the alternatives you suggest:
- I don't think we should do
argsParsed... that feels excessive; we are keeping track ifnode.processed === false, so if that's not the case (i.e. the node is processed),argswill be parsed. It's also a little silly withbodyParsedsince we are not saving the rawbody... - Stashing
args/options/bodyonnode.datawould be fine - this is just taking theDirectiveDataobject and saving it as "directive data." I somewhat like that, but at the same time, thedatafield is usually used formetadata- in the case ofmystDirectivenodes, args/options/body are the fundamental content, not metadata. 🤷
So maybe it's just fine as-implemented? (We could potentially even stop creating the data object, and pass the node as DirectiveData to the run function... But I think for now this is overthinking it...)
TL;DR: 👍 to your change, maybe minor version bump.
| .join(' ') | ||
| : ''; | ||
| const optOther = Object.entries(otherOptions) | ||
| .map(([key, value]) => `${key}=${value}`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Writing all options this way feels risky. Should we check if value has spaces?
| const marker = markerChar.repeat(markerLength); | ||
| const directiveInline = [node.name, directiveOpts].filter(Boolean).join(' '); | ||
| const args = Array.isArray(node.args) | ||
| ? state.containerPhrasing({ type: 'heading', depth: 1, children: node.args }, info) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't get this - Are we injecting this heading just to prepend #...? I don't think we always want to do that? But I might just be completely misunderstanding!
| ````{abc} | ||
| ```` | ||
| print("hello world") | ||
| value: Ax=b |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason this value was simplified? I guess maybe backticks no longer matter so much in math since we are using $$ instead of math directive.
But what about the equivalent issue - what if there is a $$ inside the math value...? Are we handling that correctly?
| :class: my-class | ||
| :name: my-note |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this test is quite correct. By the time we have a mystDirective node, I think these options should already be stripped off into a separate field. (Now the test is just treating these as part of the body)
| return writeStaticRole('math')(node, _, state); | ||
| } | ||
| const tracker = state.createTracker(info); | ||
| return tracker.move(`$${node.value}$`); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this have two $ at the beginning?
| */ | ||
| function mystDirective(node: any, _: Parent, state: NestedState, info: Info): string { | ||
| return writeStaticDirective(node.name, { argsKey: 'args' })(node, _, state, info); | ||
| function writeDirective(options?: DirectiveOptions) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I'm feeling a little confused about this.
First, I don't think we need this wrapper "factory function" - options is unused. We can just define the function directly.
Second, we are only using writeDirective when we encounter mystDirective nodes. These only persist in the tree when they have an unknown directive type. In that case, in the applyDirectives function, we are deleting the children and writing the options back to the body. It never even gets to the new code where node.args/node.options/node.body are set. For known directives, the mystDirective nodes get lifted out.
| (my-label)= | ||
| # Heading | ||
| :::{admonition #my-label .dropdown} Admonition title |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm extra confused why this works - it's a known directive type but it's using the new writeDirective code, which only applies to mystDirective nodes, not admonition nodes.
| const mdast = mystParse(markdown, { mdast: { hoistSingleImagesOutofParagraphs: false } }); | ||
| const file = unified() | ||
| .use(mystToMd) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ohhhh... now I understand. This roundtrip test is checking the myst-parse -> myst-to-md roundtrip. It is not using any of the myst-cli pipeline. Therefore, it's skipping all the basic transforms which are just applied by default in myst-cli.
If we were to try to round-trip via myst-cli, I don't think we'd see the same results.
This improves the markdown formatting output.